{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "YHK6DyunSbs4"
   },
   "source": [
    "# Ungraded Lab: Using more sophisticated images with Convolutional Neural Networks\n",
    "\n",
    "In Course 1 of this Specialization, you saw how to use a Convolutional Neural Networks (CNNs)  to make recognition of computer generated images of horses and humans more efficient. In this lesson, you'll take that to the next level: building a model to classify real images of cats and dogs. Like the horses and humans dataset, real-world images also come in different shapes, aspect ratios, etc. and you will need to consider these when preparing your data.\n",
    "\n",
    "In this lab, you will review how to build CNNs, preprocess the data with the `tf.data` API, and examine your results. You'll follow these steps:\n",
    "\n",
    "1.   Explore the example data of `Dogs vs. Cats`\n",
    "2.   Build and train a neural network to classify between the two pets\n",
    "3.   Evaluate the training and validation accuracy\n",
    "\n",
    "You will build upon your results here in the next labs so you can improve it, particularly in avoiding overfitting. Let's begin!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "UY6KJV6z6l7_"
   },
   "source": [
    "## Inspect the Dataset\n",
    "\n",
    "You will start by looking at the dataset. It consists of of 2,000 JPG pictures of cats and dogs. It is a subset of the [\"Dogs vs. Cats\" dataset](https://www.kaggle.com/c/dogs-vs-cats/data) available on Kaggle, which contains 25,000 images. You will only use 2,000 of the full dataset to decrease training time for educational purposes.\n",
    "\n",
    "First import the necessary libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import random\n",
    "import numpy as np\n",
    "from io import BytesIO\n",
    "\n",
    "# Plotting and dealing with images\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.image as mpimg\n",
    "\n",
    "import tensorflow as tf\n",
    "\n",
    "# Interactive widgets\n",
    "from ipywidgets import widgets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The contents are already saved to the base directory `./cats_and_dogs_filtered`, which contains `train` and `validation` subdirectories for the training and validation datasets (you can ignore `vectorize.py` in the output in the next cell).\n",
    "\n",
    "If you recall, the **training set** is the data that is used to tell the neural network model that 'this is what a cat looks like' and 'this is what a dog looks like'. The **validation set** is images of cats and dogs that the neural network will not see as part of the training. You can use this to test how well or how badly it does in evaluating if an image contains a cat or a dog. (See the [Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course/validation/check-your-intuition) if you want a refresher on training, validation, and test sets.)\n",
    "\n",
    "These subdirectories in turn each contain `cats` and `dogs` subdirectories. You can assign each of these directories to a variable so you can use it later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Mp39PPeAETY8"
   },
   "outputs": [],
   "source": [
    "BASE_DIR = '/tf/cats_and_dogs_filtered'\n",
    "\n",
    "train_dir = os.path.join(BASE_DIR, 'train')\n",
    "validation_dir = os.path.join(BASE_DIR, 'validation')\n",
    "\n",
    "# Directory with training cat/dog pictures\n",
    "train_cats_dir = os.path.join(train_dir, 'cats')\n",
    "train_dogs_dir = os.path.join(train_dir, 'dogs')\n",
    "\n",
    "# Directory with validation cat/dog pictures\n",
    "validation_cats_dir = os.path.join(validation_dir, 'cats')\n",
    "validation_dogs_dir = os.path.join(validation_dir, 'dogs')\n",
    "\n",
    "\n",
    "print(f\"Contents of base directory: {os.listdir(BASE_DIR)}\")\n",
    "\n",
    "print(f\"\\nContents of train directory: {train_dir}\")\n",
    "\n",
    "print(f\"\\nContents of validation directory: {validation_dir}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LuBYtA_Zd8_T"
   },
   "source": [
    "Now see what the filenames look like in the `cats` and `dogs` `train` directories. The file naming conventions are the same in the `validation` directory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "4PIP1rkmeAYS"
   },
   "outputs": [],
   "source": [
    "train_cat_fnames = os.listdir(train_cats_dir)\n",
    "train_dog_fnames = os.listdir(train_dogs_dir)\n",
    "\n",
    "print(f\"5 files in cats subdir: {train_cat_fnames[:5]}\")\n",
    "print(f\"5 files in dogs subdir: {train_dog_fnames[:5]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HlqN5KbafhLI"
   },
   "source": [
    "Check the total number of cat and dog images in the `train` and `validation` directories:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "H4XHh2xSfgie"
   },
   "outputs": [],
   "source": [
    "print(f'total training cat images: {len(os.listdir(train_cats_dir))}')\n",
    "print(f'total training dog images: {len(os.listdir(train_dogs_dir))}')\n",
    "\n",
    "print(f'total validation cat images: {len(os.listdir(validation_cats_dir))}')\n",
    "print(f'total validation dog images: {len(os.listdir(validation_dogs_dir))}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "C3WZABE9eX-8"
   },
   "source": [
    "For both cats and dogs, you have 1,000 training images and 500 validation images.\n",
    "\n",
    "Now take a look at a few pictures to get a better sense of what the cat and dog datasets look like. You can re-run the cell to see a fresh batch each time:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Wpr8GxjOU8in"
   },
   "outputs": [],
   "source": [
    "# Parameters for your graph; you will output images in a 4x4 configuration\n",
    "nrows = 4\n",
    "ncols = 4\n",
    "\n",
    "# Set up matplotlib fig, and size it to fit 4x4 pics\n",
    "fig = plt.gcf()\n",
    "fig.set_size_inches(ncols * 4, nrows * 4)\n",
    "\n",
    "next_cat_pix = [os.path.join(train_cats_dir, fname)\n",
    "                for fname in random.sample(train_cat_fnames, k=8)]\n",
    "\n",
    "next_dog_pix = [os.path.join(train_dogs_dir, fname)\n",
    "                for fname in random.sample(train_dog_fnames, k=8)]\n",
    "\n",
    "for i, img_path in enumerate(next_cat_pix+next_dog_pix):\n",
    "    # Set up subplot; subplot indices start at 1\n",
    "    sp = plt.subplot(nrows, ncols, i + 1)\n",
    "    sp.axis('Off') # Don't show axes (or gridlines)\n",
    "\n",
    "    img = mpimg.imread(img_path)\n",
    "    plt.imshow(img)\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "BQhDdYPEZvJt"
   },
   "source": [
    "It may not be obvious from looking at the images in this grid but an important note here is that these images come in all shapes and sizes (just like the 'horses or humans' dataset). So before training a neural network with them, you'll need to tweak the images. You'll see that in the next sections."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5oqBkNBJmtUv"
   },
   "source": [
    "## Building a Small Model from Scratch\n",
    "\n",
    "To train a neural network to handle the images, you'll need them to be in a uniform size. You will choose 150x150 pixels for this, and you'll see the code that preprocesses the images to that shape shortly.\n",
    "\n",
    "You can define the model by importing Tensorflow and using the Keras API. Here is the entire code first then the discussion comes after. This is very similar to the models you built in Course 1.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "QJ0E-Y7VN3gW"
   },
   "outputs": [],
   "source": [
    "model = tf.keras.models.Sequential([\n",
    "    # Rescale the image. Note the input shape is the desired size of the image: 150x150 with 3 bytes for color\n",
    "    tf.keras.Input(shape=(150, 150, 3)),\n",
    "    tf.keras.layers.Rescaling(1./255),\n",
    "    # Convolution and Pooling layers\n",
    "    tf.keras.layers.Conv2D(16, (3,3), activation='relu'),\n",
    "    tf.keras.layers.MaxPooling2D(2,2),\n",
    "    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),\n",
    "    tf.keras.layers.MaxPooling2D(2,2),\n",
    "    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),\n",
    "    tf.keras.layers.MaxPooling2D(2,2),\n",
    "    # Flatten the results to feed into a DNN\n",
    "    tf.keras.layers.Flatten(),\n",
    "    # 512 neuron hidden layer\n",
    "    tf.keras.layers.Dense(512, activation='relu'),\n",
    "    # Only 1 output neuron. It will contain a value from 0-1 where 0 for one class ('cats') and 1 for the other ('dogs')\n",
    "    tf.keras.layers.Dense(1, activation='sigmoid')\n",
    "])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_FiCc93hN6xe"
   },
   "source": [
    "You defined a [Sequential](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) model as before. The first layer is the [Rescaling](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Rescaling) layer. Note the `input_shape` parameter. Here is where you put the `150x150` size and `3` for the color depth because you have colored images.\n",
    "\n",
    "Images that go into neural networks are usually normalized to make them more amenable to processing by the network. In this case, your model will rescale the pixel values from being in the `[0, 255]` range into the `[0, 1]` range. If you've taken Course 1 of this Specialization, you might remember rescaling as part of the data pipeline (i.e. with a `.map` method of the `tf.data.Dataset`). Here, you used a different approach where this process happens within the model itself. This is usually the preferred way so you don't need to remember rescaling your images. The model does it for you.\n",
    "\n",
    "Next, you added some convolutional and pooling layers to extract features. Then, you will flatten the result to feed into the densely connected layers\n",
    "\n",
    "Because you are facing a two-class classification problem, i.e. a *binary classification problem*, you will end the network with a [*sigmoid* activation](https://wikipedia.org/wiki/Sigmoid_function). The output will be a single scalar between `0` and `1`, encoding the probability that the current image is class `1` (as opposed to class `0`).\n",
    "\n",
    "You can review the architecture of the network with the `model.summary()` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "7ZKj8392nbgP"
   },
   "outputs": [],
   "source": [
    "model.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "DmtkTn06pKxF"
   },
   "source": [
    "The `output_shape` column shows how the size of your feature map evolves in each successive layer. The convolution operation removes the outermost pixels from the original dimensions, and each pooling layer halves it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "PEkKSpZlvJXA"
   },
   "source": [
    "Next, you'll configure the specifications for model training. You will train the model with the `binary_crossentropy` loss, because it's a binary classification problem and your final activation is a sigmoid. You will use the `rmsprop` optimizer with an initial learning rate of `0.001`. The [RMSprop optimization algorithm](https://wikipedia.org/wiki/Stochastic_gradient_descent#RMSProp) automates learning-rate tuning for you. Other optimizers, such as [Adam](https://wikipedia.org/wiki/Stochastic_gradient_descent#Adam) and [Adagrad](https://developers.google.com/machine-learning/glossary/#AdaGrad), also automatically adapt the learning rate during training, and would work equally well here.\n",
    "\n",
    "During training, you will want to monitor classification accuracy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "8DHWhFP_uhq3"
   },
   "outputs": [],
   "source": [
    "model.compile(\n",
    "    optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),\n",
    "    loss='binary_crossentropy',\n",
    "    metrics = ['accuracy']\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Sn9m9D3UimHM"
   },
   "source": [
    "### Data Preprocessing\n",
    "\n",
    "Next step is to set up the `tf.data.Dataset` objects that will read pictures in the source folders and feed them (with their labels) to the model. You'll have one for the training images and another for the validation images. These will yield batches of images of size 150x150 and their labels (binary).\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "mAbVnfbTNSFb"
   },
   "outputs": [],
   "source": [
    "# Instantiate the Dataset object for the training set\n",
    "train_dataset = tf.keras.utils.image_dataset_from_directory(\n",
    "    train_dir,\n",
    "    image_size=(150, 150),\n",
    "    batch_size=20,\n",
    "    label_mode='binary'\n",
    "    )\n",
    "\n",
    "# Instantiate the Dataset object for the validation set\n",
    "validation_dataset = tf.keras.utils.image_dataset_from_directory(\n",
    "    validation_dir,\n",
    "    image_size=(150, 150),\n",
    "    batch_size=20,\n",
    "    label_mode='binary'\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "oSlV0K9qtOw0"
   },
   "source": [
    "You will then chain a few more methods to optimize the datasets for training and validation. As a refresher:\n",
    "* [cache()](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#cache) stores elements in memory as you use them so it will be faster to retrieve if you need them again\n",
    "* [shuffle()](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#shuffle), as the name suggests, shuffles the dataset randomly. A `buffer_size` of 1000 means it will first select a sample from the first 1,000 elements, then keep filling this buffer until all elements have been selected.\n",
    "* [prefetch()](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#prefetch) gets elements while the model is training so it's faster to feed in new data when the current training step is finished. A `buffer_size` set to `tf.data.AUTOTUNE` dynamically sets the number of elements to prefetch during runtime."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "OaQGr14UNlt0"
   },
   "outputs": [],
   "source": [
    "SHUFFLE_BUFFER_SIZE = 1000\n",
    "PREFETCH_BUFFER_SIZE = tf.data.AUTOTUNE\n",
    "\n",
    "train_dataset_final = train_dataset.cache().shuffle(SHUFFLE_BUFFER_SIZE).prefetch(PREFETCH_BUFFER_SIZE)\n",
    "validation_dataset_final = validation_dataset.cache().prefetch(PREFETCH_BUFFER_SIZE)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "mu3Jdwkjwax4"
   },
   "source": [
    "### Training\n",
    "You will now train on all 2,000 images available, for 15 epochs, and monitor the accuracy as well on the 1,000 images in the validation set.\n",
    "\n",
    "Do note the values per epoch. You'll see 4 -- Loss, Accuracy, Validation Loss and Validation Accuracy.\n",
    "\n",
    "The `loss` and `accuracy` are great indicators of progress in training. `loss` measures the current model prediction against the known labels, calculating the result. `accuracy`, on the other hand, is the portion of correct guesses.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Fb1_lgobv81m"
   },
   "outputs": [],
   "source": [
    "history = model.fit(\n",
    "    train_dataset_final,\n",
    "    epochs=15,\n",
    "    validation_data=validation_dataset_final,\n",
    "    verbose=2\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "o6vSHzPR2ghH"
   },
   "source": [
    "### Model Prediction\n",
    "\n",
    "Now take a look at actually running a prediction using the model. This code will allow you to choose 1 or more files from your file system, upload them, and run them through the model, giving an indication of whether the object is a cat or a dog."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the widget and take care of the display\n",
    "uploader = widgets.FileUpload(accept=\"image/*\", multiple=True)\n",
    "display(uploader)\n",
    "out = widgets.Output()\n",
    "display(out)\n",
    "\n",
    "def file_predict(filename, file, out):\n",
    "    \"\"\" A function for creating the prediction and printing the output.\"\"\"\n",
    "    image = tf.keras.utils.load_img(file, target_size=(150, 150))\n",
    "    image = tf.keras.utils.img_to_array(image)\n",
    "    image = np.expand_dims(image, axis=0)\n",
    "    \n",
    "    prediction = model.predict(image, verbose=0)[0][0]\n",
    "    \n",
    "    with out:\n",
    "        if prediction > 0.5:\n",
    "            print(filename + \" is a dog\")\n",
    "        else:\n",
    "            print(filename + \" is a cat\")\n",
    "\n",
    "\n",
    "def on_upload_change(change):\n",
    "    \"\"\" A function for geting files from the widget and running the prediction.\"\"\"\n",
    "    # Get the newly uploaded file(s)\n",
    "    \n",
    "    items = change.new\n",
    "    for item in items: # Loop if there is more than one file uploaded  \n",
    "        file_jpgdata = BytesIO(item.content)\n",
    "        file_predict(item.name, file_jpgdata, out)\n",
    "\n",
    "\n",
    "uploader.observe(on_upload_change, names='value')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-8EHQyWGDvWz"
   },
   "source": [
    "### Visualizing Intermediate Representations\n",
    "\n",
    "To get a feel for what kind of features your CNN has learned, one fun thing to do is to visualize how an input gets transformed as it goes through the model.\n",
    "\n",
    "You can pick a random image from the training set, and then generate a figure where each row is the output of a layer, and each image in the row is a specific filter in that output feature map. Rerun this cell to generate intermediate representations for a variety of training images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "-5tES8rXFjux"
   },
   "outputs": [],
   "source": [
    "# Define a new Model that will take an image as input, and will output\n",
    "# intermediate representations for all layers in the previous model\n",
    "successive_outputs = [layer.output for layer in model.layers]\n",
    "visualization_model = tf.keras.models.Model(inputs = model.inputs, outputs = successive_outputs)\n",
    "\n",
    "# Prepare a random input image from the training set.\n",
    "cat_img_files = [os.path.join(train_cats_dir, f) for f in train_cat_fnames]\n",
    "dog_img_files = [os.path.join(train_dogs_dir, f) for f in train_dog_fnames]\n",
    "img_path = random.choice(cat_img_files + dog_img_files)\n",
    "img = tf.keras.utils.load_img(img_path, target_size=(150, 150))  # this is a PIL image\n",
    "x = tf.keras.utils.img_to_array(img) # Numpy array with shape (150, 150, 3)\n",
    "x = x.reshape((1,) + x.shape) # Numpy array with shape (1, 150, 150, 3)\n",
    "\n",
    "# Run the image through the network, thus obtaining all\n",
    "# intermediate representations for this image.\n",
    "successive_feature_maps = visualization_model.predict(x)\n",
    "\n",
    "# These are the names of the layers, so you can have them as part of our plot\n",
    "layer_names = [layer.name for layer in model.layers]\n",
    "\n",
    "# Display the representations\n",
    "for layer_name, feature_map in zip(layer_names, successive_feature_maps):\n",
    "\n",
    "    if len(feature_map.shape) == 4:\n",
    "\n",
    "        #-------------------------------------------\n",
    "        # Just do this for the conv / maxpool layers, not the fully-connected layers\n",
    "        #-------------------------------------------\n",
    "        n_features = feature_map.shape[-1]  # number of features in the feature map\n",
    "        size = feature_map.shape[1]  # feature map shape (1, size, size, n_features)\n",
    "\n",
    "        # Tile the images in this matrix\n",
    "        display_grid = np.zeros((size, size * n_features))\n",
    "\n",
    "        #-------------------------------------------------\n",
    "        # Postprocess the feature to be visually palatable\n",
    "        #-------------------------------------------------\n",
    "        for i in range(n_features):\n",
    "            x = feature_map[0, :, :, i]\n",
    "            x -= x.mean()\n",
    "            x /= x.std ()\n",
    "            x *=  64\n",
    "            x += 128\n",
    "            x = np.clip(x, 0, 255).astype('uint8')\n",
    "            display_grid[:, i * size : (i + 1) * size] = x # Tile each filter into a horizontal grid\n",
    "\n",
    "        #-----------------\n",
    "        # Display the grid\n",
    "        #-----------------\n",
    "        scale = 20. / n_features\n",
    "        plt.figure(figsize=(scale * n_features, scale))\n",
    "        plt.title(layer_name)\n",
    "        plt.grid(False)\n",
    "        plt.imshow(display_grid, aspect='auto', cmap='viridis')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tuqK2arJL0wo"
   },
   "source": [
    "You can see above how the pixels highlighted turn to increasingly abstract and compact representations, especially at the bottom grid.\n",
    "\n",
    "The representations downstream start highlighting what the network pays attention to, and they show fewer and fewer features being \"activated\"; most are set to zero. This is called _representation sparsity_ and is a key feature of deep learning. These representations carry increasingly less information about the original pixels of the image, but increasingly refined information about the class of the image. You can think of a CNN (or a deep network in general) as an information distillation pipeline wherein each layer filters out the most useful features."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Q5Vulban4ZrD"
   },
   "source": [
    "### Evaluating Accuracy and Loss for the Model\n",
    "\n",
    "You will plot the training/validation accuracy and loss as collected during training:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "0oj0gTIy4k60"
   },
   "outputs": [],
   "source": [
    "def plot_loss_acc(history):\n",
    "    '''Plots the training and validation loss and accuracy from a history object'''\n",
    "    acc = history.history['accuracy']\n",
    "    val_acc = history.history['val_accuracy']\n",
    "    loss = history.history['loss']\n",
    "    val_loss = history.history['val_loss']\n",
    "    \n",
    "    epochs = range(len(acc))\n",
    "    \n",
    "    fig, ax = plt.subplots(1,2, figsize=(12, 6))\n",
    "    ax[0].plot(epochs, acc, 'bo', label='Training accuracy')\n",
    "    ax[0].plot(epochs, val_acc, 'b', label='Validation accuracy')\n",
    "    ax[0].set_title('Training and validation accuracy')\n",
    "    ax[0].set_xlabel('epochs')\n",
    "    ax[0].set_ylabel('accuracy')\n",
    "    ax[0].legend()\n",
    "    \n",
    "    ax[1].plot(epochs, loss, 'bo', label='Training Loss')\n",
    "    ax[1].plot(epochs, val_loss, 'b', label='Validation Loss')\n",
    "    ax[1].set_title('Training and validation loss')\n",
    "    ax[1].set_xlabel('epochs')\n",
    "    ax[1].set_ylabel('loss')\n",
    "    ax[1].legend()\n",
    "    \n",
    "    plt.show()\n",
    "\n",
    "plot_loss_acc(history)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "DgmSjUST4qoS"
   },
   "source": [
    "As you can see, the model is **overfitting** like it's getting out of fashion. The training accuracy (in blue) gets close to 100% while the validation accuracy (solid line) stalls at 70%. The validation loss reaches its minimum after only five epochs.\n",
    "\n",
    "Since you have a relatively small number of training examples (2000), overfitting should be the number one concern. Overfitting happens when a model exposed to too few examples learns patterns that do not generalize to new data, i.e. when the model starts using irrelevant features for making predictions. For instance, if you, as a human, only see three images of people who are lumberjacks, and three images of people who are sailors, and among them the only person wearing a cap is a lumberjack, you might start thinking that wearing a cap is a sign of being a lumberjack as opposed to a sailor. You would then make a pretty lousy lumberjack/sailor classifier.\n",
    "\n",
    "Overfitting is the central problem in machine learning: given that you are fitting the parameters of our model to a given dataset, how can you make sure that the representations learned by the model will be applicable to data it has never seen before? How do you avoid learning things that are specific to the training data?\n",
    "\n",
    "In the next exercise, you'll look at ways to prevent overfitting in this classification model."
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "T4",
   "private_outputs": true,
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0rc1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
